Towards Best Practice for Multiword Expressions in Computational Lexicons
Authors
Abstract
The importance and role of multiword expressions (MWEs) in the description and processing of natural language have long been recognized. However, multiword information has often been relegated to the marginal role of idiosyncratic lexical information. The need for MWE lexicons is even more acute for multilingual applications, in which (sometimes complex) correspondences must be identified, classified, and recorded. Within the XMELLT and ISLE projects, we have begun to investigate the potential for developing multilingual MWE lexicons that incorporate both syntactic and semantic information. We aim to specify means of acquiring and representing multiword lexical entries for multiple languages, and to establish uniform (or inter-translatable) standards for describing them. We explored the theoretical approaches used in large lexicon-building projects, in particular FrameNet and SIMPLE. These constitute interesting frameworks for the explicit syntactic and semantic representation of MWEs, mainly because they can capture semantic multidimensionality, through frame elements and qualia relations respectively. We also developed an abstract data model for lexical information, together with an XML representation of it. Our goal is to define a set of minimal lexicon “objects” that can serve as a model not only for MWEs but also for lexical data in general.
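To make the idea of a minimal lexicon object with an XML representation more concrete, the following Python sketch shows one way an MWE entry could combine FrameNet-style frame elements, SIMPLE-style qualia roles, and cross-lingual links. It is a hedged illustration, not the XMELLT/ISLE data model: the class names, the XML element and attribute names (`mwe`, `components`, `frame`, `frameElement`, `qualia`, `translation`), the entry ids, and the example frame and role labels are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FrameElement:
    """A FrameNet-style semantic role and how the MWE realizes it syntactically."""
    name: str          # role label (illustrative, e.g. "Protagonist")
    realization: str   # syntactic realization (illustrative, e.g. "subject NP")


@dataclass
class MWEEntry:
    """Hypothetical minimal lexicon object for a multiword expression."""
    entry_id: str
    language: str
    components: list[str]                                   # the words making up the MWE
    frame: Optional[str] = None                             # evoked semantic frame, if any
    frame_elements: list[FrameElement] = field(default_factory=list)
    qualia: dict[str, str] = field(default_factory=dict)    # SIMPLE-style qualia roles
    translations: list[str] = field(default_factory=list)   # ids of equivalent entries

    def to_xml(self) -> ET.Element:
        """Serialize the entry to a hypothetical XML element (not a published schema)."""
        entry = ET.Element("mwe", id=self.entry_id, lang=self.language)
        comps = ET.SubElement(entry, "components")
        for position, word in enumerate(self.components):
            ET.SubElement(comps, "word", position=str(position)).text = word
        # Syntactic/semantic frame information is emitted only when present.
        if self.frame is not None:
            frame = ET.SubElement(entry, "frame", name=self.frame)
            for fe in self.frame_elements:
                ET.SubElement(frame, "frameElement", name=fe.name).text = fe.realization
        # Qualia relations are emitted only when at least one role is filled.
        if self.qualia:
            qualia = ET.SubElement(entry, "qualia")
            for role, value in self.qualia.items():
                ET.SubElement(qualia, role).text = value
        # Multilingual correspondences are stored as references to other entries.
        for target in self.translations:
            ET.SubElement(entry, "translation", ref=target)
        return entry


# Usage: a verbal idiom described via a frame, and a nominal compound via qualia.
kick_en = MWEEntry("en-kick-the-bucket", "en", ["kick", "the", "bucket"],
                   frame="Death",
                   frame_elements=[FrameElement("Protagonist", "subject NP")],
                   translations=["fr-casser-sa-pipe"])
wm_en = MWEEntry("en-washing-machine", "en", ["washing", "machine"],
                 qualia={"formal": "artifact", "telic": "wash clothes"},
                 translations=["fr-machine-a-laver"])
print(ET.tostring(wm_en.to_xml(), encoding="unicode"))
```

Keeping frame information and qualia optional lets the same object type describe both verbal idioms and nominal compounds, in the spirit of a single set of lexicon objects that is not specific to MWEs.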
Similar resources
Achieving Adequacy of Description of Multiword Entities in Semantically-Oriented Computational Lexicons
This article discusses three aspects of recording multiword expressions (MWEs) in semantically oriented lexicons for NLP: achieving syntactic adequacy, achieving semantic adequacy, and computing the semantic contribution of non-compositional elements. The purpose of the analysis is twofold: first, to provide a descriptive, example-based account of how complex aspects of MWEs can be treated in ...
Multiword Expressions: A Pain in the Neck for NLP
Multiword expressions are a key problem for the development of large-scale, linguistically sound natural language processing technology. This paper surveys the problem and some currently available analytic techniques. The various kinds of multiword expressions should be analyzed in distinct ways, including listing “words with spaces”, hierarchically organized lexicons, restricted combinatoric r...
MULTILINGUAL MULTIWORD EXPRESSIONS Literature Survey
Multiword Expressions are idiosyncratic word usages of a language which often have noncompositional meaning. Knowledge of multiword expressions is necessary for many NLP tasks, such as machine translation, natural language generation, named entity recognition, sentiment analysis, etc. In order for other NLP applications to benefit from the knowledge of multiword expressions, they need to be ide...
Lexical Representation of Multiword Expressions in Morphologically-Complex Languages
In spite of the surging interest in multiword expressions (MWEs) in recent years, it is still unclear how such expressions should be stored in computational lexicons. This problem is amplified in morphologically-complex languages, where the unique properties of MWEs interact with non-trivial morphological processes. We propose an architecture for lexical representation of MWEs, augmented ...
Building Multiword Expressions Bilingual Lexicons for Domain Adaptation of an Example-Based Machine Translation System
In this paper, we describe a hybrid approach to automatically building bilingual lexicons of Multiword Expressions (MWEs) from parallel corpora. More specifically, we investigate the impact of using a domain-specific bilingual lexicon of MWEs on domain adaptation of an Example-Based Machine Translation (EBMT) system. We conducted experiments on the English-French language pair and two kinds of texts...